Abstract In this paper a deterministic preprocessing algorithm is
presented, whose output can be given as input to most state-of-the-art
epipolar geometry estimation algorithms, improving their results
considerably. They are now able to succeed on hard cases for which they
failed before. The algorithm consists of three steps, whose scope changes
from local to global. In the local step it extracts from a pair of images
local features (e.g. SIFT). Similar features from each image are clustered
and the clusters are matched, yielding a large number of putative matches.
In the second step pairs of spatially close features (called 2keypoints)
are matched and ranked by a classifier. The 2keypoint matches with the
highest ranks are selected. In the global step, from each two 2keypoint
matches a fundamental matrix is computed. As quite a few of the matrices
are generated from correct matches they are used to rank the putative
matches found in the first step. For each match the number of fundamental
matrices, for which it approximately satisfies the epipolar constraint, is
calculated. This set of matches is combined with the putative matches
generated by standard methods and their probabilities to be correct are
estimated by a classifier. These are then given as input to
state-of-the-art epipolar geometry estimation algorithms such as BEEM,
BLOGS and USAC, yielding much better results than the original algorithms.
This was shown in extensive testing performed on almost 900 image pairs
from six publicly available datasets.

Keywords Fundamental matrix • epipolar geometry 
estimation • local features • SIFT 


1 Introduction 

Epipolar geometry estimation from image pairs with partial scene overlap
is a basic problem in computer vision. It is used as a component of many
important applications such as vision based robot navigation, structure
from motion (SfM) and other multiple view geometry applications.

This problem has attracted considerable interest in the computer vision
community, interest which continues to this day. Most of the successful
algorithms are based on an initial step, in which local features are
detected in both images. For each detected feature a local descriptor is
computed. These features are then matched based on their local
descriptors. For each putative match a prior probability or score is
estimated. These putative matches and scores are given as input to the
algorithm (Chum et al, 2003; Chum and Matas, 2005; Brahmachari and Sarkar,
2013b; Goshen and Shimshoni, 2008; Raguram et al, 2013a; Tordoff and
Murray, 2002). Even though successful algorithms have been proposed to
address this problem, it still remains an active field of research. This
is because there are several reasons why the input given to these
algorithms may be challenging. As pointed out by Lowe (2004) and tested
extensively by … et al (2005), as the angle between the viewing directions
increases, the appearance of local descriptors changes, making them hard
to match. Thus, wide baseline images are hard inputs for the algorithms.
Urban scenes are also challenging for such algorithms. In such scenes
features such as windows are repeated several times. In such cases it is
hard for the local matching algorithm to match the window in the first
image to its corresponding window in the second image. In both these types
of cases the percentage of correct matches (inliers) from the set of
putative matches is low. When the probabilities are taken into account,
the problem is that the percentage of correct matches with high prior
probabilities is low. In these cases, even state-of-the-art algorithms
tend to fail.


For that reason, in this paper, instead of trying to propose a new
epipolar geometry estimation algorithm, we present a preprocessing step
which is given two images as input and returns a set of putative matches
with their associated probabilities. Our method was extensively tested on
almost 900 image pairs from different datasets: the ZuBuD dataset (Shao et
al, 2003), the BLOGS dataset (Brahmachari and Sarkar, 2013a), the USAC
dataset (Raguram et al, 2013b) and the Open1, Open2 and Urban datasets
(Goldman et al, 2014). Our results are much better than those obtained by
the standard initial steps of state-of-the-art algorithms. Consequently,
when our output is given as input to them (BEEM (Goshen and Shimshoni,
2008), BLOGS (Brahmachari and Sarkar, 2013b) and USAC (Raguram et al,
2013a) in this paper) they outperform the same algorithms operating on
their regular input. Our output is general and can be incorporated within
many other algorithms such as Chum et al (2003); Chum and Matas (2005);
Tordoff and Murray (2002).


The algorithm starts with standard techniques of detecting local features
and extracting putative correspondences from them. Using this input we
propose a new concept consisting of three steps, running from local to
global. In the local step features are clustered together in each image.
Clusters with similar features from the first image are matched to
clusters of features from the second image and vice versa. The result of
this step is a large set of putative matches, most of which are incorrect.
In the second step we match pairs of spatially close features
(2keypoints) in the first image to corresponding pairs of features (found
in the first step) from the second image. For each 2keypoint match a short
descriptor is generated, characterizing the quality of the match. Using a
classifier we trained on data from several image pairs, for each 2keypoint
match the probability of being correct is estimated. The $K_{2kp}$
2keypoint matches with the highest probabilities are chosen. Here already
the percentage of correct 2keypoint matches is much higher than was
recovered in the first step. In the global step, the 2keypoint matches are
used to generate a large number of possible fundamental matrices. For each
putative match from the first step we calculate the number of fundamental
matrices it supports. Finally we combine putative matches generated by
standard methods with those found by our method, and estimate their
probabilities to be correct, using a simple classifier.

The correlation between these probabilities and the ground truth
inlier-outlier labels is much higher than for the priors produced by the
standard initial steps. As a result, when we submit the putative matches
and the computed probabilities as input to algorithms from the guided
RANSAC family, much better results are obtained on challenging datasets.
For example, the performance of all three algorithms when run on the Open2
dataset (Goldman et al, 2014) increased considerably. The number of image
pairs they succeeded on increased by between 62% and 239% relative to the
original performance of those algorithms. This demonstrates that our
algorithm improves the quality of the input significantly, resulting in
better results of the basic algorithm. Similar improvement was obtained
when the algorithm was run as a preprocessing step of USAC on the Urban
dataset (Goldman et al, 2014).

The paper continues as follows. In Section 2 we review related work
concentrating mainly on how the quality of the input affects the
performance of the algorithm. Section 3 presents the overview of our
method, while the details are given in the next section. Experimental
results are presented in Section 5. We conclude in Section 6.


2 Related work 


In reviewing related work we will concentrate on how the quality of the
input affects the algorithm's performance and not on the various
components of the algorithms.

We will first consider PROSAC (Chum and Matas, 2005) and USAC (Raguram et
al, 2013a). The algorithm is given as input a set of putative matches
ordered by a score or prior probability. Under this general framework the
set of putative matches can be ordered for example using the distance
ratio $d_r$ method introduced by Lowe (2004). The models are generated in
an order consistent with the order of the matches used to generate them.
Once a model is generated it is verified using the Sequential Probability
Ratio Test (SPRT). The putative matches are tested until the SPRT reaches
a decision on whether the model is correct or not. Thus, when the
beginning of the list (matches with high scores) is contaminated by a
large number of outliers, the number of required iterations increases
considerably. When the number of iterations of the algorithm is limited,
this also increases the probability of failure. On the other hand, a list
consisting of a large number of matches does not affect the running time,
since the SPRT process usually reaches a decision quite early in the
verification procedure.
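The growth in the number of iterations can be illustrated with the classical RANSAC sample-count bound. This is only a sketch under stated assumptions: it uses the textbook bound for uniform sampling, not the actual termination rules of PROSAC or USAC, and the function name `ransac_iterations`, the 0.99 confidence level and the sample size of 7 are choices made for the illustration.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of uniformly drawn minimal samples needed so that, with
    probability `confidence`, at least one sample contains only inliers."""
    p_good = inlier_ratio ** sample_size      # chance that one sample is all inliers
    if p_good <= 0.0:
        return math.inf
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

# Assumption: 7 matches per sample (a minimal sample for a fundamental matrix).
for w in (0.8, 0.5, 0.2):
    print(w, ransac_iterations(w, sample_size=7))
```

The required count rises steeply as the inlier ratio among the sampled matches falls, which is why the quality of the highest-ranked matches matters so much when the iteration budget is limited.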


In algorithms from the Guided RANSAC family (Tordoff and Murray, 2002;
Goshen and Shimshoni, 2008) the subset of matches used for model
generation is chosen according to their probability. Thus, the performance
of the algorithm is similar to that of PROSAC. When the list contains a
large number of outliers with high probabilities, the chances of the
algorithm to fail are high.

each match is accompanied by a prior probability (or score) that the match
is correct. The higher the quality of this set, the more probable that
algorithms from the guided RANSAC family (Chum and Matas, 2005; Raguram et
al, 2013a; Brahmachari and Sarkar, 2013b; Goshen and Shimshoni, 2008) will
succeed to estimate the epipolar geometry. In this section we will present
an overview of the algorithm. The details will be given in the next
section.

The algorithm, given in pseudo-code in Algorithm 1, is described as
follows:


A similar behavior occurs in BLOGS (Brahmachari and Sarkar, 2013b). There
also, in the global search step, a model is computed from a minimal subset
of matches according to a score. Thus, the probability of finding a model
consisting only of inliers depends on the quality of the prior scores.
Specifically in BLOGS, a new method for putative match ranking was
introduced, and their scores are referred to as similarity weights
$\{t_k\}$.

Thus, for all these algorithms, if we can assign more accurate
probabilities to the putative matches, the algorithms' performance should
improve considerably. This is exactly the goal of the algorithm we suggest
here.

We would also like to review two other algorithms which address the
problem of matching images containing scenes with repeated structures. In
that case the initial stage of the algorithms mentioned above will fail to
match a feature belonging to repeated structures to its correct match in
the second image, since it will not be able to choose the correct
candidate. Thus, this feature will be discarded. In Generalized RANSAC
(Zhang and Kosecka, 2006), all possible matches of the feature to similar
(normalized cross correlation above a certain threshold) features in the
second image are generated but are given low probabilities. Guided RANSAC
is then run on the list of putative matches. Thus, when there are not
enough non-repeating inliers in the list with high probabilities, the
algorithm might fail. In our previous work (Kushnir and Shimshoni, 2014),
a special algorithm was developed to deal with buildings with repeated
features. There also, all possible matches of the feature to similar
features in the second image are generated. The algorithm assumes that in
both images a planar facade is visible. The algorithm tends to fail when
this assumption is not satisfied. In this work we propose a method which
can successfully deal with general scenes, including the case of repeated
structures.


3 Algorithm outline 


Algorithm 1 A General Preprocessing Method for Improved Performance of
Epipolar Geometry Estimation
 1: Input: images $I_1$ and $I_2$
 2: Extract SIFT features from $I_1$ and $I_2$
 3: Find standard putative correspondences $\{X_L\}$ and associate to
    them distance ratios $\{d_r\}$
 4: Find standard putative correspondences $\{X_B\}$ and associate to
    them similarity weights $\{t_k\}$
 5: Cluster SIFT features from each image based on descriptor
    similarity, yielding clusters of features
 6: Estimate relative roll angle $\alpha_{exp}$
 7: for $\alpha \in [\alpha_{exp}, 0^\circ]$ do
 8:     Match clusters from the two images, yielding cluster pairs
 9:     Generate putative correspondences $\{X\}$ from the members of
        the matched clusters
10:     Generate all 2keypoints: a pair of features from a main feature
        point and another feature point which is close to it in the
        image
11:     Match 2keypoints from the first image to the 2keypoints from
        the second image
12:     Use a classifier to assign probabilities to 2keypoint matches
13:     Select the top $K_{2kp}$ of 2keypoint matches
14:     Estimate a candidate fundamental matrix from each two matched
        2keypoints, yielding $K_{2kp}(K_{2kp} - 1)/2$ matrices
15:     For each putative match from $\{X\}$ count how many candidate
        fundamental matrices ($sfm$) support it
16:     Assign each putative match from $\{X\} \cup \{X_L\} \cup \{X_B\}$
        a probability that it is correct
17:     Use these putative matches and their associated probabilities
        as input to one of the algorithms from the guided RANSAC family
        to yield a fundamental matrix and its support
18: end for
19: Choose from the two fundamental matrices, the one with maximal
    support
20: return The fundamental matrix and the list of its inliers


The algorithm is given as input two images $I_1$ and $I_2$. The first step
of the algorithm (described in Section 4.1) is to detect features (SIFT in
our case) in each image. Those features are used to generate three groups
of putative correspondences.

The goal of the algorithm is to generate a set of putative feature matches
between the two images, where

Following the standard method introduced by Lowe (2004), we find putative
correspondences $\{X_L\}$ and associate to them distance ratios
$\{d_r\}$. The distance ratio is used to assign a prior probability for
the correctness of the match (described in Section 4.1).

Following the scheme introduced by Brahmachari and Sarkar (2013b), we find
putative correspondences $\{X_B\}$ and associate to them similarity
weights $\{t_k\}$.

To resolve problems which occur in image pairs which are hard to match,
such as scenes which include repeating elements, we cluster detected
features (described in Section 4.2). Thus, repeated features or features
with very similar descriptors are clustered together. Non-repeating
features will belong to clusters of size one. Then each cluster from the
first image is matched to the most similar cluster in the second image and
vice versa. The result of this step is a large number of putative
correspondences $\{X\}$ (described in Section 4.3). A vast majority of
them, however, are incorrect.

An example of how clustering similar features can help in the case of a
scene which includes repeating elements is shown in Figure 1. For the
feature point marked by a Red circle in the upper image Lowe (2004) finds
no match and Brahmachari and Sarkar (2013b) find the feature point marked
by a Red circle in the bottom image, which is incorrect. When using
clustering, as we suggest, the feature point in the upper image belongs to
a cluster of size one, while in the bottom image two feature points are
clustered together, marked by Green points. Only in the case of clustering
is a correct match generated, namely the correct match is a member of
$\{X\}$, but not of $\{X_L\}$ or $\{X_B\}$.

In addition, in order to overcome the problem of features looking similar
to rotated features (such as window corners), for the clustering step
only, the orientation of all the SIFT features in the image is fixed in
one specific direction. Thus, for example, different corners of a window
will not be clustered together. In order to determine this direction, we
propose to find a rough relative roll angle $\alpha$ between the two
images, from the differences of SIFT orientations in $\{X_L\}$ and
$\{X_B\}$, and denote it $\alpha_{exp}$. Using this approximation, and the
fact that many images are taken with zero roll angle as a prior, all the
following steps of our algorithm are repeated twice, once for
$\alpha = \alpha_{exp}$ and again for $\alpha = 0^\circ$.

In order to overcome the problem that the majority of the matches in
$\{X\}$ are incorrect we estimate their probabilities to be inliers. This
is done in two steps:

In the first step, which is described in Section 4.4, local information is
used. We create a pair of features from a main feature point and another
feature point which is close to it in the image. This pair of features is
called a 2keypoint. These two features are matched to corresponding
features in the second image which are also close to each other and belong
to matching clusters. An illustration of a 2keypoint match is shown in
Figure 2.

Fig. 1 An example of why using the standard putative matches as is, is
sometimes insufficient, while clustering might help. Images GE000038 and
GE000029 were taken from the Urban dataset. The feature point marked by a
Red circle in the upper image is not matched at all by Lowe or mistakenly
matched by BLOGS to a feature point marked by a Red circle in the bottom
image. When using clustering, for the feature point marked by the Red
circle in the upper image we generate two putative matches marked by Green
points in the bottom image, one of which is correct.

The decision to work with 2keypoints is a compromise between two
contradictory preferences. On the one hand any combination of features
contains more information than a single keypoint, which can be used to
detect inliers more accurately. In general, the larger the number of
features in the combination, the higher the probability that the matched
combination is correct. On the other hand, since the probability for
feature detection is low, the probability for detecting a large number of
features in a combination is even lower. Thus, relying on the minimal
subset of features is preferable due to the difficulties in detecting
large combinations of features.

Fig. 2 An example of the advantage in using 2keypoints. Images (a)
FLH00010 and (b) FLH00016 taken from the Open2 dataset, (c) Zoom-in of
(a), (d) Zoom-in of (b). A correct 2keypoint match generated by our method
and ranked in the fifth place. When single keypoint matches are used both
Blue and Red matches are ranked much lower.

From the set of 2keypoint matches we would like to choose a subset, which
have a high prior probability to be correctly matched. In order to
accomplish this, each 2keypoint match is characterized by a short
descriptor. The descriptor consists of measures of geometric similarity
between the two 2keypoints and a count of the number of possible matches
between each 2keypoint in one of the images to 2keypoints in the other
image. As the interdependencies between these characteristics are complex,
a classifier is trained to learn the probability that the 2keypoint is
correctly matched. At test time each 2keypoint match is assigned a
probability and the top $K_{2kp}$ (100 in our application) 2keypoint
matches are selected.
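The generation of 2keypoints and the selection of the top-ranked matches can be sketched as follows. This is a minimal illustration, not the authors' implementation: the neighbourhood `radius`, the helper names `make_2keypoints` and `select_top`, and the representation of a 2keypoint as an ordered (main, neighbour) index pair are assumptions, and the probabilities are taken as given since the classifier itself is not specified here.

```python
import math

def make_2keypoints(points, radius):
    """Pair every main feature with each other feature within `radius` pixels.
    `points` is a list of (x, y); returns ordered (main_idx, neighbour_idx) pairs."""
    pairs = []
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                pairs.append((i, j))
    return pairs

def select_top(matches, probabilities, k):
    """Keep the k matches with the highest (externally supplied) probability."""
    order = sorted(range(len(matches)), key=lambda n: probabilities[n], reverse=True)
    return [matches[n] for n in order[:k]]

points = [(0, 0), (3, 4), (100, 100)]
print(make_2keypoints(points, radius=10))   # only the two nearby points pair up

k_2kp = 100                                 # value quoted in the text
print(k_2kp * (k_2kp - 1) // 2)             # pairs of selected 2keypoint matches
```

With 100 selected 2keypoint matches there are 4950 unordered pairs, which is the number of candidate fundamental matrices produced later in the global step.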


An example of how the generation and ranking of 2keypoint matches can help
in dealing with low ranking matches is shown in Figure 2. In that example,
if only single keypoints are used, standard techniques (Lowe, 2004) and
(Brahmachari and Sarkar, 2013b) would order the Blue match in the 55th and
16th places respectively, whereas the Red match would be placed in the
161st place or would not be generated at all. On the other hand, when
2keypoint matches are used, the correct 2keypoint is ranked in the fifth
place.

In the second step, described in Section 4.5, global information is used.
Up until now the analysis we performed has been local in nature. We first
matched single features and then pairs of close features. In order to be
able to assign more accurate probabilities to the matches, the epipolar
geometry constraint, which is global in nature, comes into play. In order
to generate rough estimations of the fundamental matrix we borrow an idea
from the BEEM algorithm (Goshen and Shimshoni, 2008), where it is
estimated from only two matches. In our case from each two matched
2keypoints a candidate fundamental matrix is estimated, yielding
$K_{2kp}(K_{2kp} - 1)/2$ matrices. As a result of the ranking of the
2keypoints, quite a few of them are generated from inlier matches. The
problem is that even in this case they are quite inaccurate. Each of them
is supported (i.e., the Sampson distance computed from the matrix and the
match is below a certain threshold) by a subset of the inliers and quite a
few outliers. Instead of returning the matrix with the largest support, we
exploit these matrices in a different way. For each putative match from
$\{X\}$ we count how many candidate fundamental matrices ($sfm$) support
it. This number is a strong indication of the probability that this
putative match is an inlier.
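The support count can be sketched with the standard Sampson distance. The code below is an illustration under assumptions: matrices are plain 3x3 nested lists, the threshold value is arbitrary, and the two toy matrices (pure horizontal and pure vertical translation) stand in for the candidates estimated from 2keypoint matches. The names `sampson_distance` and `support_counts` are hypothetical.

```python
def sampson_distance(F, p1, p2):
    """First-order geometric error of the match p1 <-> p2 under F (3x3 nested lists)."""
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]    # F x1
    Ftx2 = [sum(F[r][c] * x2[r] for r in range(3)) for c in range(3)]   # F^T x2
    residual = sum(x2[r] * Fx1[r] for r in range(3))                    # x2^T F x1
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / denom if denom > 0 else float("inf")

def support_counts(candidates, matches, threshold):
    """For each match, count the candidate matrices whose Sampson distance is below threshold."""
    return [sum(sampson_distance(F, p1, p2) < threshold for F in candidates)
            for (p1, p2) in matches]

# Toy candidates (assumptions for the demo): cross-product matrices of a pure
# horizontal and a pure vertical translation.
F_horizontal = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]   # epipolar lines: y' = y
F_vertical = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]     # epipolar lines: x' = x
candidates = [F_horizontal, F_horizontal, F_vertical]
matches = [((10, 20), (50, 20)), ((10, 20), (10, 60)), ((10, 20), (50, 60))]
print(support_counts(candidates, matches, threshold=1.0))
```

The first match satisfies both horizontal candidates, the second only the vertical one, and the third none, so the counts separate matches by how many candidate geometries agree with them.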

Finally we combine the three groups of putative correspondences $\{X\}$,
$\{X_L\}$ and $\{X_B\}$, in order to achieve a set of putative feature
matches between the two images, where each match is accompanied by a prior
probability that the match is correct (described in Section 4.6). For that
purpose we construct a keypoint match descriptor, denoted $kpmd$, and
train a classifier on it. The descriptor consists of the local measures of
similarity, namely $\{d_r\}$ and $\{t_k\}$, and the global measure
$\{sfm\}$. At test time each putative feature match is assigned a
probability.

As was already mentioned, all the previous steps of our algorithm are
repeated twice, once for $\alpha = \alpha_{exp}$ and again for
$\alpha = 0^\circ$. In order to proceed we run one of the algorithms from
the guided RANSAC family twice, once for each set of putative matches, and
choose the one with maximal support.

The output of the entire process is a fundamental matrix along with its
inlier set.


4 Algorithm details 

We will now delve into the details of the various components of the
algorithm.


4.1 Extraction of putative matches with their distance 
ratios and similarity weights 


The algorithm is given as input two images. As a first step we apply
feature detection on both images. In general, any feature detector which
returns the location, scale, and orientation can be used (e.g. MSER (Matas
et al, 2002), BRISK (Leutenegger et al, 2011), ORB (Rublee et al, 2011),
SURF (Bay et al, 2006), SIFT (Lowe, 2004)). In our case we use the
implementation of SIFT by Vedaldi and Fulkerson (2008).

Following the standard method introduced by Lowe (2004), we find putative
correspondences $\{X_L\}$ based on descriptor similarity. The best
candidate match for each keypoint in the first image is found by
identifying its nearest neighbor in the second image. The nearest neighbor
is defined as the keypoint with maximal normalized cross-correlation with
the given descriptor vector. The probability that a match is correct can
be determined by taking a distance ratio

$$d_r = \frac{\cos^{-1}(m_k)}{\cos^{-1}(m_{k2})},$$

where $m_k$ is the similarity to the closest neighbor and $m_{k2}$ is the
second highest similarity in the second image. All matches for which the
distance ratio is greater than a certain threshold (in our case 0.9) are
rejected. This choice of threshold distance ratio is relatively high
(there are many works where 0.85 or even 0.8 are used) and many more
matches are kept. This is done since we rely on the next steps of our
method to deal with them correctly (see for example Figure 2).
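The ratio test described above can be sketched in a few lines. This is an illustration under assumptions: descriptors are plain lists of floats, the similarity is taken to be the normalized dot product of the raw descriptor vectors (the exact normalization used for the cross-correlation is not spelled out here), and the helper names are hypothetical.

```python
import math

def similarity(a, b):
    """Normalized dot product of two descriptor vectors (assumed similarity measure)."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0

def distance_ratio(m_k, m_k2):
    """d_r = arccos(m_k) / arccos(m_k2), with similarities clamped to [-1, 1]."""
    best = math.acos(max(-1.0, min(1.0, m_k)))
    second = math.acos(max(-1.0, min(1.0, m_k2)))
    return best / second if second > 0 else 1.0   # identical candidates: ambiguous

def ratio_matches(desc1, desc2, max_ratio=0.9):
    """Nearest-neighbour matches from image 1 to image 2 that pass the ratio test.
    Requires at least two descriptors in desc2."""
    matches = []
    for i, d in enumerate(desc1):
        sims = sorted(((similarity(d, e), j) for j, e in enumerate(desc2)), reverse=True)
        (m_k, j), (m_k2, _) = sims[0], sims[1]
        d_r = distance_ratio(m_k, m_k2)
        if d_r <= max_ratio:                      # matches above the threshold are rejected
            matches.append((i, j, d_r))
    return matches

distinct = ratio_matches([[1.0, 0.0, 0.0]], [[1.0, 0.1, 0.0], [0.0, 1.0, 0.0]])
repeated = ratio_matches([[1.0, 0.0, 0.0]], [[1.0, 0.1, 0.0], [1.0, -0.1, 0.0]])
print(distinct, repeated)
```

In the second call the two candidates are equally similar, so the ratio equals one and the match is rejected, which is the repeated-structure failure mode that the clustering step is meant to address.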

In addition we follow the scheme introduced by Brahmachari and Sarkar
(2013b), which is a different way to define putative correspondences and
weights. We find putative correspondences $\{X_B\}$ that exhibit the
highest similarity measure in both images. This means that putative
correspondence $X_k = (u_k, v_k)$ is a member of $\{X_B\}$ if for a
keypoint in the first image $u_k$ its nearest neighbor in the second image
is $v_k$ and for the keypoint in the second image $v_k$ its nearest
neighbor in the first image is $u_k$. With each such putative match pair,
they associate a confidence measure which is referred to as the similarity
weight. They define the similarity weight $t_k$ for the correspondence
$X_k$ in terms of $m_k$, $m_{k1}$ and $m_{k2}$, where $m_{k1}$ is the
second highest similarity in the first image and $m_{k2}$ is the second
highest similarity in the second image as mentioned above. While the third
term of $t_k$, i.e. $(1 - \frac{m_{k2}}{m_k})$, can be interpreted as
another version of $d_r$, the two other terms are new. The second term,
i.e. $(1 - \frac{m_{k1}}{m_k})$, is a symmetric complement of the third
one, and it emphasizes that there should be no difference between the
treatment of the first and second image. The first term, i.e.
$(1 - \exp(-m_k))$, is a similarity based component. While Lowe in his
work did not use the absolute distance/similarity as a measure of
similarity, in BLOGS the contribution of the absolute similarity exists.


4.2 Feature clustering

Using the former putative matches as is, is sometimes insufficient due to
the following two problems which occur in challenging image pairs. In
scenes which include repeating elements (such as windows of buildings),
the matching process is unable to match the repeated features correctly.
In image pairs with wide baselines, the descriptors of the matching
features are quite dissimilar and will receive quite low matching scores.
We will now deal with the first problem. The second problem will be
addressed in Section 4.4.

In each image, the features recovered from it in Section 4.1 are clustered
based on descriptor similarity. In our algorithm we use agglomerative
clustering. The merging of clusters stops when the similarity measure
between the closest clusters is below a certain threshold (normalized
cross-correlation below 0.85). The result of this process is a set of
clusters of features. Non-repeating features yield clusters of size one.
Each cluster is represented by the median descriptor of its members.
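The agglomerative clustering of Section 4.2 can be sketched as below. This is a simplified illustration, not the authors' code: it assumes that the similarity between two clusters is measured between their component-wise median descriptors, that similarity is the normalized dot product, and it uses a naive cubic-time merge loop. Only the 0.85 stopping threshold and the median representative come from the text.

```python
import math
from statistics import median

def similarity(a, b):
    """Normalized dot product of two descriptor vectors (assumed similarity measure)."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm > 0 else 0.0

def representative(descriptors, members):
    """Component-wise median descriptor of the cluster members."""
    dim = len(descriptors[members[0]])
    return [median(descriptors[m][d] for m in members) for d in range(dim)]

def cluster_features(descriptors, min_similarity=0.85):
    """Merge the two most similar clusters until the best similarity drops below the threshold."""
    clusters = [[i] for i in range(len(descriptors))]
    while len(clusters) > 1:
        reps = [representative(descriptors, c) for c in clusters]
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = similarity(reps[a], reps[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < min_similarity:
            break                                   # closest clusters are too dissimilar
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

descriptors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(cluster_features(descriptors))
```

The two nearly identical descriptors end up in one cluster, while the dissimilar one remains a cluster of size one, mirroring the behaviour described for repeating and non-repeating features.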


Fig. 3 An example of keypoint clustering with and without 
fixed orientation. Image object0008.view04 was taken from 
the ZuBuD dataset. Black circles: clustering with fixed SIFT 
orientation. Red points: clustering without fixed orientation. 
In the latter case three different types of corners are clustered 
together. 

In order to overcome the problem, which is common in buildings, of
features looking similar to rotated features (such as window corners), for
the clustering step only, the orientation of all the SIFT features in the
image is fixed in one specific direction. Thus, for example, different
corners of a window will not be clustered together. An example of keypoint
clustering with and without fixing the orientation of all the SIFT
features is presented in Figure 3. The Red points show a cluster without
fixed orientation. In that case three different orientations of the window
corner are clustered together. The Black circles are features clustered
together when fixing the SIFT orientation. All of them are upper left
corners of a window. In general, clustering without fixed orientation of
all the SIFT features in the image leads not only to larger clusters,
which can be handled by our method, but to systematic errors when matching
features from those clusters. This will be further explained later on.

A General Preprocessing Method for Improved Performance of Epipolar Geometry Estimation Algorithms

Fig. 4 An example of a relative roll angle $\alpha$ estimation. Images (a)
corridor1 and (b) corridor2 were taken from the BLOGS dataset. The images
in (a) and (b) are taken with relative roll of $\alpha = 78^\circ$. (c)
The kernel density estimation of the angle difference for (a) and (b) with
the maximal peak at $\alpha_{exp} = 78^\circ$. Images (d) IMG0047 and (e)
IMG0106 were taken from the Open1 dataset. The images in (d) and (e) are
taken with zero relative roll. (f) The kernel density estimation of the
angle difference for (d) and (e) with the maximal peak at
$\alpha_{exp} = -3^\circ$.

The approach of defining one specific orientation for all the SIFT
features in the image has been extensively used in the literature in the
frame of upright SIFT, and in this work we generalize this idea to any
orientation. It is true that there are many applications, such as vision
based robot navigation and structure from motion, where all the images are
taken with a zero roll angle, which justifies the upright SIFT assumption
in all the images. However, since we do not limit our approach to any
specific application, we propose to find a rough approximation of the
relative roll angle $\alpha$ between the two images from the existing
data. For that purpose we calculate the difference of SIFT orientations in
each putative match found in Section 4.1 and build a kernel density
estimation of those angle differences. Although the transformation between
the two images is perspective and not affine, the maximal peak
$\alpha_{exp}$ of this function can be used as a rough approximation for
$\alpha$.

Two examples of $\alpha$ extraction are shown in Figure 4. The red arrows
were added to indicate the upright direction. In the first row the image
pair, taken with a relative roll of $\alpha = 78^\circ$, along with its
kernel density estimation of the angle difference is presented. The
maximal peak of this kernel density is precisely
$\alpha_{exp} = 78^\circ$. In the second row an image pair taken with zero
relative roll is shown. The maximal peak of its kernel density estimation
is located at $\alpha_{exp} = -3^\circ$, which is quite a good
approximation for $\alpha$.

Using this approximation and the fact that many images are taken with zero
roll angle as a prior, we proceed as follows. When running the algorithm,
the first image is processed one time using an upright SIFT, while the
second image is processed in two orientations
$[\alpha_{exp}, 0^\circ]$. Therefore all the following steps of our
algorithm, described in Sections 4.3 to 4.6, are repeated twice, once for
each orientation.
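The roll-angle estimate can be sketched as the peak of a kernel density over the per-match orientation differences. The code is an illustration under assumptions: a Gaussian kernel with a 5 degree bandwidth, a 1 degree evaluation grid, differences taken as second-image orientation minus first-image orientation, and wrap-around handled on the circle. None of these parameters is taken from the paper, and `estimate_roll` is a hypothetical name.

```python
import math

def wrap_deg(angle):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def estimate_roll(orientation_pairs, bandwidth=5.0, step=1.0):
    """Return the grid angle maximizing a Gaussian kernel density of the
    orientation differences (o2 - o1) of the putative matches, in degrees."""
    diffs = [wrap_deg(o2 - o1) for o1, o2 in orientation_pairs]
    best_angle, best_density = 0.0, -1.0
    for n in range(int(round(360.0 / step))):
        angle = -180.0 + n * step
        density = sum(math.exp(-0.5 * (wrap_deg(angle - d) / bandwidth) ** 2)
                      for d in diffs)
        if density > best_density:
            best_angle, best_density = angle, density
    return best_angle

# Four matches agree on a roll of about 78 degrees; the last one is an outlier.
pairs = [(0, 78), (10, 88), (20, 97), (30, 109), (5, 200)]
print(estimate_roll(pairs))
```

Because the estimate is the mode of the density rather than the mean, a minority of wrong matches with arbitrary orientation differences does not shift it.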

4.3 Generation of all keypoint matches 

After both of the images have been processed as de¬ 
scribed above, the next step is to generate pairs of pos¬ 
sible matches between features from the two images. 
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The main problem we have to overcome is when a real 
cluster is segmented into several clusters. We try to 
deal with this problem as follows. Each cluster from 
the first image is matched to the closest cluster from 
the second image using the normalized cross-correlation 
between the cluster representatives. This process is re¬ 
peated when the roles of the images are switched. Thus, 
if a real cluster was over-segmented in one image but 
not in the other we can still match the clusters from 
the two images correctly. If there is over-segment at ion 
in both images, not all possible matches will be found. 

Due to this problem we cannot use the distance ratio
method suggested by Lowe (2004). If the distances to the
closest cluster and the second closest cluster are similar,
we cannot distinguish between over-segmentation and the
case in which both clusters in the second image are equally
far from the cluster in the first image and should not be
matched at all. Therefore the closest cluster is always
chosen.
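
The two-directional cluster matching described above can be
sketched as follows. This is only an illustrative sketch, not
the authors' implementation: the function names and the
representation of a cluster representative as a plain list of
numbers are assumptions made for the example.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation between two equal-length vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    return sum(x * y for x, y in zip(ca, cb)) / denom if denom > 0 else 0.0


def match_clusters(reps1, reps2):
    """Match clusters in both directions and return the union of (i, j) pairs.

    reps1 / reps2 are hypothetical lists of cluster representatives
    (one descriptor vector per cluster) for the first / second image.
    The closest cluster is always taken; no distance-ratio test is applied.
    """
    # Each cluster of image 1 -> its closest cluster in image 2.
    forward = {
        (i, max(range(len(reps2)), key=lambda j: ncc(r1, reps2[j])))
        for i, r1 in enumerate(reps1)
    }
    # Each cluster of image 2 -> its closest cluster in image 1.
    backward = {
        (max(range(len(reps1)), key=lambda i: ncc(reps1[i], r2)), j)
        for j, r2 in enumerate(reps2)
    }
    return forward | backward
```

With a cluster that is split in the first image only, the
forward pass links both fragments to the single cluster in the
second image, which is the situation the text describes.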



Fig. 5 An example of keypoint clustering and matching. Im¬ 
ages (a) object0076.view02 and (b) object0076.view04 were 
taken from the ZuBuD dataset. The large Red cluster in (b) 
is segmented into two smaller (Green and Yellow) clusters in 
(a). In this example due to cluster matching in both directions 
all the correct matches were found. 


An example of the result of the clustering process
is shown in Figure 5. The large Red cluster in the second
image is segmented into two smaller (Green and
Yellow) clusters in the first image. The Red cluster was
matched to the Yellow cluster when clusters from the
second image were matched to clusters in the first image.
Because the matching is also done from clusters in the
first image to the closest cluster in the second image,
the Green cluster is also matched to the Red cluster.
In this example all of the correct matches were found
together with many incorrect matches.

Each pair of clusters which is matched yields a set of 
putative feature matches from the members of the two 
clusters. The result of this step is a large number of 
possible matches most of which are obviously incorrect. 
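
As a minimal sketch of this step, assuming that each cluster is
stored as a list of feature indices (the dictionaries
`members1` and `members2` are hypothetical names introduced
here), every matched cluster pair contributes the full cross
product of its members:

```python
from itertools import product


def putative_matches(cluster_pairs, members1, members2):
    """Expand matched cluster pairs into putative feature matches.

    cluster_pairs: iterable of (c1, c2) cluster index pairs.
    members1[c1] / members2[c2]: feature indices in each cluster.
    Returns a sorted list of unique (feature1, feature2) pairs.
    """
    matches = set()
    for c1, c2 in cluster_pairs:
        # All-to-all pairing: most of these pairs will be incorrect.
        matches.update(product(members1[c1], members2[c2]))
    return sorted(matches)
```

The quadratic growth in the number of pairs is what makes the
ranking steps of the following sections necessary.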

We will now refer to the systematic errors mentioned
above when explaining the clustering without a
fixed orientation of all the SIFT features in the image.
As was already shown, when the orientation of all the
SIFT features is not kept fixed, features looking similar
to rotated features (such as window corners) will
be clustered together. In that case, when generating
putative feature matches, there will be matches of the
same feature (left upper corner in both images for
example) and there will also be matches of the rotated
features (left upper corner in the first image matched
to right lower corner in the second image for example).
In the next steps of the algorithm, these matches of the
rotated features will all vote together, supporting each
other, and will lead to systematic errors. We therefore
chose to keep the orientation of all the SIFT features
fixed during the clustering in Section 4.2, to prevent
such failures.


4.4 Generation and ranking of 2keypoint matches 

Most of the feature pairs generated in the previous stage 
are incorrect and therefore in order to be able to use 
them for epipolar geometry estimation prior probabil¬ 
ities have to be assigned to them. This will be done 
in two steps: a step which uses only local information 
which will be described here and a step which uses 
global information described in the next section. 

Recall that each SIFT feature $p$ has, besides a
descriptor, also a scale $s(p)$ and an orientation angle
$\alpha(p)$. These values will be used in our analysis to
make it scale and orientation invariant. For each feature
point $p$ we add a neighboring feature $n$. The distance
between the features in terms of the scale of $p$ is denoted
$d = |p - n| / s(p)$. The angle of the vector connecting
$p$ to $n$, measured with respect to $\alpha(p)$, is denoted
$\theta$. This pair of features will be termed a 2keypoint.
A 2keypoint $\{p_1, n_1\}$ in the first image is matched to a
2keypoint $\{p_2, n_2\}$ in the second image. Naturally,
$p_1$ and $p_2$, and $n_1$ and $n_2$, have to be putative
matches. This set of four features is illustrated in
Figure 6.

We suggest three methods to choose the neighboring
pairs $\{n_1, n_2\}$ close to $\{p_1, p_2\}$. The first
method simply takes $n_i$, $i = 1, 2$, from the $K_1$ closest
features around $p_i$. The second method chooses $n_i$ from
all the features within a certain distance $K_2 s(p_i)$ in
pixels from $p_i$. This parameter is given in units of scale
in order to be scale invariant. Finally, the third method
chooses $n_i$ from the $K_3$ closest features which belong to
the same cluster as $p_i$. Experimentally we found that the
best results are achieved for $K_1 = 5$, $K_2 = 5$, and
$K_3 = 1$.
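
The three neighbor-selection rules can be sketched as below.
This is an illustrative sketch under simple assumptions:
features are given as (x, y) tuples, `scales` and `labels` are
hypothetical parallel lists holding each feature's SIFT scale
and cluster label, and a brute-force search is used instead of
a spatial index.

```python
import math


def _dist(a, b):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def k_nearest(points, i, k1=5):
    """Method 1: the k1 closest features around feature i."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: _dist(points[i], points[j]))[:k1]


def within_radius(points, scales, i, k2=5.0):
    """Method 2: all features within k2 * s(p_i) pixels of feature i."""
    radius = k2 * scales[i]
    return [j for j in range(len(points))
            if j != i and _dist(points[i], points[j]) <= radius]


def same_cluster_nearest(points, labels, i, k3=1):
    """Method 3: the k3 closest features sharing the cluster of feature i."""
    same = [j for j in range(len(points))
            if j != i and labels[j] == labels[i]]
    return sorted(same, key=lambda j: _dist(points[i], points[j]))[:k3]
```

The default arguments mirror the values $K_1 = 5$, $K_2 = 5$
and $K_3 = 1$ reported in the text.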

In order to estimate the probability that a 2keypoint
match consists only of inliers, we have to take into
account quite a few factors. Due to the interdependencies
between these factors and their effects on the estimated
probability, we construct a 2keypoint match descriptor,
denoted $2kpmd$, and train a classifier on it.

Fig. 6 An example of 2keypoints and the parameters used for their matching. Images (a) object0041.view01 and (b)
object0041.view04 were taken from the ZuBuD dataset. (c) Zoom-in of (a), (d) Zoom-in of (b). Green and Magenta points show
the 2keypoints generated by the first method. Blue and Red points show a 2keypoint generated by the third method.

The descriptor consists of the following fields: 


$2kpmd = [N_1; N_2; dist_r; angle_d; cluster_t; min_d]$.


The definitions of the fields are as follows: $N_1$
and $N_2$ are the number of 2keypoint matches that
the 2keypoints $\{p_1, n_1\}$ and $\{p_2, n_2\}$ belong to
respectively. The smaller the values, the higher the
probability that the 2keypoint match consists of inliers.
$dist_r = \min(d_1/d_2, d_2/d_1)$ is the ratio of the
distance between $p_1$ and $n_1$ in the first image in terms
of $s(p_1)$ to the distance between $p_2$ and $n_2$ in the
second image in terms of $s(p_2)$. This measure is scale
invariant and its value should be close to one. The value
$angle_d = angdiff(\theta_1, \theta_2)$ measures the
difference between the angles associated with the two
2keypoints and should be close to zero. The field
$cluster_t$ is equal to one if $p_i$ and $n_i$ belong to the
same cluster and zero otherwise. Finally, the distance
between the point and its neighbor also affects the
probability that the 2keypoint match is correct. The further
apart the points are, the lower the probability is. We
therefore define $min_d = \min(d_1, d_2)$.
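
To make the geometric fields concrete, the following sketch
computes the descriptor for one 2keypoint match. It is a
hedged illustration rather than the authors' code: the helper
names are hypothetical, and `angdiff` is assumed here to be the
absolute angular difference wrapped to $[0, \pi]$, which the
text does not spell out.

```python
import math


def polar_offset(p, n, scale, orientation):
    """Return (d, theta) of neighbor n relative to feature p.

    d is the pixel distance divided by the scale s(p); theta is the
    direction of the vector p -> n measured relative to the feature
    orientation (radians).
    """
    dx, dy = n[0] - p[0], n[1] - p[1]
    return math.hypot(dx, dy) / scale, math.atan2(dy, dx) - orientation


def angdiff(a, b):
    """Absolute difference between two angles, wrapped to [0, pi] (assumed)."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def two_kp_descriptor(d1, theta1, d2, theta2, n1_count, n2_count, same_cluster):
    """Assemble [N1, N2, dist_r, angle_d, cluster_t, min_d]."""
    dist_r = min(d1 / d2, d2 / d1)           # close to 1 for a correct match
    angle_d = angdiff(theta1, theta2)        # close to 0 for a correct match
    return [n1_count, n2_count, dist_r, angle_d, int(same_cluster), min(d1, d2)]
```

For a second image that is a scaled and rotated copy of the
first, the scale and orientation normalization makes $dist_r$
equal to one and $angle_d$ equal to zero, up to rounding.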

In order to train the classifier we chose six image
pairs from the ZuBuD dataset. Four of them were cases
that state-of-the-art algorithms were able to match,
while the other two were more challenging. For these
image pairs we manually found the ground truth matches
and trained a C4.5 decision tree classifier (Quinlan 1993)
on the data. This classifier returns for each descriptor
the probability that the 2keypoint match consists of
inliers. The training set consists of 31352 2keypoint
matches of which 4102 are inliers and the rest are
outliers. The quality of the classifier was estimated
using a 10-fold cross-validation procedure. When choosing
a classifier, we tried several options such as random
forest (Breiman, 2001), SVM (Vapnik, 1995) and others.
The C4.5 decision tree classifier was selected, as the
one which not only classifies correctly 91.6% of the
2keypoint matches, but also gives a maximal precision
(the proportion of positive results that are true
positives) of 73.8%, which is the most important
parameter, as will now be explained.

Fig. 7 2keypoint match classifier performances on the
training set (horizontal axis: number of 2keypoint matches
ordered by classifier). Solid curves represent results of
the classifier. Dashed curves show results without the
classifier (the inlier percentages from all the 2keypoint
matches). When using the classifier, there are many more
inliers in the top $K_{2kp} = 100$ 2keypoint matches.

The 2keypoint matches are then sorted by probability
and the highest $K_{2kp}$ ($K_{2kp} = 100$ in our
implementation) are chosen. Thus, what is most important is
that a fair share of the top $K_{2kp}$ matches are inliers
(precision). This is evident from the results on
the training set shown in Figure 7. The cumulative
precision of the classifier is shown as a function (on a log
scale) of the number of 2keypoint matches ordered by
the probability returned by the classifier. Dashed curves
represent the inlier percentages among all the 2keypoint
matches, which is the precision that would be obtained
without a classifier. Consider for example the hardest case
of the image pair (obj066view2,obj066view5). From the top
ranked 100 2keypoint matches, 41% were inliers, while their
percentage among all the 2keypoint matches was only 3.84%.
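
The selection and the cumulative precision curve can be
sketched as follows, assuming the classifier probabilities and
ground-truth inlier flags are already available as lists (the
function names are hypothetical):

```python
def top_k(matches, probs, k=100):
    """Return the k matches with the highest classifier probability."""
    order = sorted(range(len(matches)), key=lambda i: probs[i], reverse=True)
    return [matches[i] for i in order[:k]]


def cumulative_precision(inlier_flags_sorted):
    """Fraction of inliers among the first r matches, for r = 1, 2, ...

    inlier_flags_sorted must already be ordered by decreasing probability.
    """
    precisions, hits = [], 0
    for rank, flag in enumerate(inlier_flags_sorted, start=1):
        hits += bool(flag)
        precisions.append(hits / rank)
    return precisions
```

The last value of the returned list is the overall inlier
fraction, which corresponds to the dashed baseline in the
figure.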


4.5 Global ranking of matches 

Even though we could use the 2keypoint matches found
in the previous step as the input for epipolar geometry
estimation, better results can be obtained by exploiting
global information. In BEEM (Goshen and Shimshoni 2008)
a method was proposed to generate a rough estimate of the
fundamental matrix using only two pairs of matches instead
of 7 or 8. This is done by using the similarity
transformation between the regions around the corresponding
features to generate three additional matches for each
“real” match. The resulting estimated fundamental matrix is
quite inaccurate but can be used as a basis for local
optimization (Chum et al. 2003), yielding good results. In
our case we use two 2keypoint matches (four matched points)
as the input for estimating the fundamental matrix. One
2keypoint match could not be used since all the points from
each image are too close to each other to generate a
meaningful result.

For each of the $K_{2kp}(K_{2kp} - 1)/2$ pairs of
2keypoint matches a fundamental matrix F is generated. All
the putative matches generated in Section 4.3 are checked
to see whether they support F or not. Instead of taking
the fundamental matrix with the largest support as the
result of our algorithm, we suggest here a method to
exploit all the generated fundamental matrices. Since we
assume that many of them were generated from inlier
2keypoint matches, they are rough estimates of the required
solution. Thus, we measure the support of the putative
matches. The larger the number of fundamental matrices
which support the match (sfm), the higher the probability
that the match is correct.
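
A minimal sketch of the support counting is given below. The
text only says that a match should approximately satisfy the
epipolar constraint; using the Sampson distance with a fixed
pixel threshold is an assumption made for this example, as are
the function names and the representation of F as a 3x3 nested
list.

```python
def sampson_sq(F, p1, p2):
    """Squared Sampson distance of the match p1 <-> p2 under F.

    F is a 3x3 nested list; p1 and p2 are (x, y) pixel coordinates.
    Degenerate cases with a zero denominator are not handled.
    """
    x1, y1 = p1
    x2, y2 = p2
    # a = F @ [x1, y1, 1]: epipolar line of p1 in the second image.
    a = [F[r][0] * x1 + F[r][1] * y1 + F[r][2] for r in range(3)]
    # b = F.T @ [x2, y2, 1]: epipolar line of p2 in the first image.
    b = [F[0][c] * x2 + F[1][c] * y2 + F[2][c] for c in range(3)]
    num = (x2 * a[0] + y2 * a[1] + a[2]) ** 2
    den = a[0] ** 2 + a[1] ** 2 + b[0] ** 2 + b[1] ** 2
    return num / den


def count_sfm(Fs, matches, thresh=1.0):
    """For each match, count the fundamental matrices that support it.

    thresh is a hypothetical threshold in pixels, not a value from the text.
    """
    return [sum(sampson_sq(F, p1, p2) < thresh ** 2 for F in Fs)
            for p1, p2 in matches]
```

Because every match is scored against every matrix, the cost of
this step grows with the product of the number of putative
matches and the number of generated matrices.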

An example of the spatial distribution of the inlier
matches is shown in Figure 8. Since the fundamental
matrices generated from inliers are quite inaccurate,
only a small number of matches, which lie close to each
other, are supported by a large number of fundamental
matrices (marked in Blue). The other matches with
lower support are distributed around this group in an
irregular manner.

Fig. 8 An example of most supported inliers. Image
object0092.view02 was taken from the ZuBuD dataset. Matches
supported by more than 600 sfms are plotted in Blue,
matches with more than 200 sfms in Green, and the rest
in Red.

In Figure 9 we compare results obtained by exploiting
global information to the ones obtained using only
the 2keypoint matches found in Section 4.4. For that,
we present the cumulative precision (inlier fraction)
as a function of the number of putative matches,
ordered based on their associated probabilities. The solid
curves show the results obtained by exploiting global
information, namely the cumulative precision as a function
of the number of keypoint matches ordered by their
sfms. The dashed curves show the results found in
Section 4.4, namely the cumulative precision as a function
of the number of keypoint matches ordered by the
probability returned by the 2keypoint match classifier.
Each color represents a different image pair. As can
be seen in the graph, better results are obtained by
exploiting global information, in addition to using the
2keypoint match ranking computed in Section 4.4. Consider
for example the hardest case of the image pair
(FLH00010,FLH00016). From the top ranked 100 putative
matches ordered by their sfms, 11% were inliers,
while if they were ordered by the probability returned
by the 2keypoint match classifier only 5% would be
inliers.

Fig. 9 2keypoint match classifier performance vs.
global ranking of matches performance (horizontal axis:
number of matches ordered by their probabilities). Red:
images object0076.view02 and object0076.view04 were taken
from the ZuBuD dataset, shown in Figure 5. Green: images
FLH00010 and FLH00016 were taken from the Open2
dataset, shown in Figure 2. Blue: images GE000029 and
GE000038 were taken from the Urban dataset, shown in
Figure 1. Solid curves show the results of exploiting global
information, while the dashed curves show the results found
in Section 4.4. Better results are obtained by exploiting
global information.

The result of this step is the set of putative matches
{X} described in Section 4.3 and their sfms {sfm}.


4.6 Combining all the data

As was stated earlier, the goal of the algorithm is to
generate a set of putative feature matches between the two
images, where each match is accompanied by a prior
probability (or score) that the match is correct. For
that purpose, in Sections 4.3-4.5 we presented a three
step algorithm, running from local to global and
generating putative matches and their sfms. In addition, in
Section 4.1 we calculated putative match pairs $\{X_L\}$
and $\{X_B\}$, which have a large intersection with {X},
along with their distance ratios $\{d_r\}$ and/or
similarity weights $\{t_k\}$, based on the local features
only. In order to incorporate those local scores in our
method, we constructed a keypoint match descriptor, denoted
$kpmd$, and trained a classifier on it.

The descriptor consists of the following fields: 

$kpmd = [sfm; d_r; t_k]$.

The definitions of the fields are as follows: $sfm$ is the
number of fundamental matrices which support the
match, calculated in Section 4.5. $d_r$ and $t_k$ are the
distance ratio and the similarity weight described in
Section 4.1 respectively. For those putative matches that
miss $d_r$ or $t_k$, we attribute ones for $d_r$, and zeros
for $t_k$. For those putative matches in $\{X_L\}$ and
$\{X_B\}$ that miss the $sfm$ we attribute zeros. In
general, the smaller the value of $d_r$ and the higher the
values of $sfm$ and $t_k$, the higher is the probability
that the putative feature match is an inlier.
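
The handling of missing fields can be sketched as follows,
under the assumption (made only for this example) that the
three scores are kept in dictionaries keyed by the match:

```python
def build_kpmd(all_matches, sfm, d_r, t_k):
    """Build [sfm, d_r, t_k] for every match, filling in missing fields.

    sfm, d_r and t_k are hypothetical dicts mapping a match to its score.
    Missing values follow the defaults stated in the text:
    sfm -> 0, d_r -> 1, t_k -> 0.
    """
    return {
        m: [sfm.get(m, 0), d_r.get(m, 1.0), t_k.get(m, 0.0)]
        for m in all_matches
    }
```

The defaults are the least favorable values of each field, so
a match is never rewarded for a score it does not have.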

The general idea behind this step is to improve
the performance on challenging image pairs, while not
harming the performance on easy ones. For that purpose
the classifier should operate correctly under different
scenarios. On the one hand, when an image pair is
challenging, the putative match pairs $\{X_L\}$ and
$\{X_B\}$ are insufficient and the classifier should rely
on {X} and their sfms. On the other hand, for easy image
pairs {X} might be misleading, while relying on $\{X_L\}$
and $\{X_B\}$ works. Since there is no way to know a priori
which scenario we are dealing with, the classifier should
rank highly both putative matches with high sfms and
missing (or low) $t_k$ and $d_r$, and match pairs with
missing sfms but with high $t_k$ and/or low $d_r$ values.


Using the training set described above, we trained a
C4.5 decision tree classifier (Quinlan 1993). This
classifier returns for each descriptor the probability that
the putative feature match is an inlier. The training set
consists of 14255 feature matches, of which 1399 are
inliers and the rest are outliers. Here again a 10-fold
cross-validation procedure was run. The resulting
classifier correctly classifies 94.9% of the feature
matches.

The result of this step is a set of putative matches 
and their associated probabilities. 


4.7 Epipolar geometry estimation 


As was already mentioned in Section 4.2, the steps of
our algorithm described in Sections 4.3-4.6 are repeated
twice, once for each orientation. Therefore at this stage
there are actually two sets of putative matches and their
associated probabilities. To finalize the process we run
an algorithm from the guided RANSAC family twice,
once for each set of putative matches, yielding two
fundamental matrices. The one with maximal support (the
larger number of inliers) is chosen.


5 Experiments 

Our method is a preprocessing step for state-of-the-art
algorithms for epipolar geometry estimation. Therefore,
in order to evaluate it, we compared the performance
of three known algorithms, BEEM, BLOGS and USAC,
with and without our method. In all three cases we
used the original implementations including all algorithm
parameters, as proposed by their authors, available
on the Internet. We ran experiments with the same
parameters on all the results included in this work.
These parameters were automatically selected to produce
optimal results.


5.1 Test Data 

To demonstrate the generality of our method, we used
almost 900 image pairs from six separate publicly
available sources for test data. Each image pair except
those from the “USAC dataset” came with a small set of
ground truth correspondences, which are different from
the SIFT features used to estimate the epipolar geometry.
These correspondences were used by the authors
in their performance evaluation. The mean of roots of
their Sampson distances served as our quantitative
performance measure. The lower the value, the closer the
proposed solution is to the ground truth.

ZuBuD dataset (Shao et al. 2003): The dataset contains
1005 color images of 201 buildings (5 images per
building) from Zurich, taken from different viewpoints
and under different illumination conditions, yielding
2010 image pairs. In (Kushnir and Shimshoni 2014)
two subsets of it were used: the “ZuBuD1 set” of 139
challenging image pairs (two of which we used for training
as mentioned in Section 4.4 and the rest for test) and
the “ZuBuD2 set” of relatively easy image pairs. This
way we can check the performance of the algorithm on
both hard and easy cases.


BLOGS dataset (Brahmachari and Sarkar, 2013a): 
The BLOGS dataset consists of 20 image pairs, some of 
which have very wide baselines, scale changes, rotations 
and occlusions. 


USAC dataset (Raguram et al., 2013b): In the USAC
dataset there are 11 image pairs. Since image pairs from 
this dataset come without control points, we manually 
marked 16 correspondences for each image pair, serving 
as the ground truth. 

Since the “BLOGS dataset” and the “USAC 
dataset” are quite small, in our experiments we merged 
them into a single dataset. 


Open1, Open2 and Urban datasets (Goldman et al. 2014):
These three datasets, which were collected at different
locations, include 246, 224 and 108 image pairs
respectively. They were used for testing the SOREPP
algorithm (Goldman et al. 2015). The datasets present
challenging scenarios with wide baseline images, small
overlapping regions, scale changes, and nondescript
objects that make feature matching difficult. Under these
conditions the inlier fractions are often less than 10%.


5.2 Qualitative results 



Fig. 10 A comparison between different methods of
ranking (horizontal axis: number of matches ordered by
their probabilities). Red: images object0076.view02 and
object0076.view04 were taken from the ZuBuD dataset, shown
in Figure 5. Green: images FLH00010 and FLH00016 were taken
from the Open2 dataset, shown in Figure 2. Blue: images
GE000029 and GE000038 were taken from the Urban dataset,
shown in Figure 1. Solid curves are results of our method.
Dotted curves are based on the distance ratio proposed by
Lowe. Dashed curves are based on similarity weights
introduced in BLOGS. Our ranking method outperforms the
standard ones.



We will start this discussion with a presentation of
qualitative results on the three image pairs already
mentioned in Figures 1, 2, 5 and 9. Figure 10 shows the
cumulative precision (inlier fraction) as a function of the
number of putative matches, ordered based on their
associated probabilities (on a log scale). The results of
our method are drawn using solid curves. The putative
match ranking based on the distance ratio proposed
by Lowe, which is used as an initial step of many
state-of-the-art algorithms such as USAC and BEEM, is drawn
using dotted curves. The putative match ranking based
on similarity weights introduced in BLOGS is drawn
using dashed curves. Different colors represent the
different image pairs. One can easily see that our ranking
method outperforms the standard ones.


Table 1 A numeric comparison of our method, on several
examples, to two common techniques for ranking putative
matches.

                             Lowe/USAC/BEEM   BLOGS   Our method
object0076.view02 and object0076.view04
  Number of matches               415          355       5347
  # inliers from top 10             9            9          9
  # inliers from top 100           67           61         92
  Success                         ✓ ✓           ✓       ✓ ✓ ✓
FLH00010 and FLH00016
  Number of matches               493         1161       7735
  # inliers from top 10             1            2          2
  # inliers from top 100            7            7         11
  Success                         - -           -       - - -
GE000029 and GE000038
  Number of matches               674         1038       9469
  # inliers from top 10             6            6          7
  # inliers from top 100           13           16         38
  Success                         - ✓           -       ✓ ✓ ✓


In Table 1 we present numeric comparisons of our
method on these three examples to these two common
techniques for ranking putative matches. The number
of matches we generate is roughly ten times larger than
that of the other methods. Even so, as our ranking is much
better correlated with the probability of being an inlier,
as was already shown in Figure 10, the performance is
not hurt. Numerically speaking, we recover at least as
many inliers in the top 10 and more inliers in the top 100
ranked matches. This should translate into improved
performance of the subsequent registration process. To
check that, we ran BEEM, BLOGS and USAC with and without
our preprocessing method and report on their success, which
will be defined in the next section. The first example
(object0076.view02,object0076.view04) is an easy
case and all the algorithms, with or without the
preprocessing step, succeed on it. The second image pair
(FLH00010,FLH00016) is so challenging that, although
the number of inliers was increased by our method, it
remains unsolved in all the cases. The last image pair
(GE000029,GE000038) is a typical example of our
contribution. Without our method only BEEM found a
correct fundamental matrix, whereas when our preprocessing
step is used, all three algorithms succeed.


Table 2 Numeric comparison of general performance with and without our preprocessing step on the standard datasets.
The number of successful image pairs, with performance measure smaller than 10 pixels, is reported.

                                ZuBuD1   ZuBuD2   BLOGS + USAC DB   Urban   Open1   Open2
Number of image pairs             137      137          31           108     246     224
BLOGS                            69.4      135         24.4         35.8    80.2      52
Our method followed by BLOGS    104.6    133.8         26.8         54.2   119.2      90
Our contribution for BLOGS      50.7%    -0.9%         9.8%        51.4%   48.6%   73.1%
BEEM                               96    134.2         26.2         46.6   104.8    76.6
Our method followed by BEEM       107    134.2         27.2           64   132.4   124.2
Our contribution for BEEM       11.5%       0          3.8%        37.3%   26.3%   62.1%
USAC                               66    133.8         24.4         23.6    77.6      27
Our method followed by USAC      95.6    132.2         26.2           51   124.8    91.6
Our contribution for USAC       44.8%    -1.2%         7.4%         116%   60.8%    239%

5.3 General performance 

To evaluate the general performance of our method,
we present a comparison between BEEM, BLOGS and
USAC with and without our preprocessing step on
the previously mentioned datasets. We use ground
truth correspondences and the quantitative performance
measure mentioned in Section 5.1. For every algorithm
on each image pair we check this performance
measure and consider it a success when it is smaller
than a threshold.
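
The success test can be sketched as follows. This is an
illustration under stated assumptions: the Sampson distance is
taken in its usual squared form, so its root is in pixels, and
the function names and the representation of F as a 3x3 nested
list are hypothetical.

```python
import math


def sampson_sq(F, p1, p2):
    """Squared Sampson distance of the correspondence p1 <-> p2 under F."""
    x1, y1 = p1
    x2, y2 = p2
    a = [F[r][0] * x1 + F[r][1] * y1 + F[r][2] for r in range(3)]  # F @ x1
    b = [F[0][c] * x2 + F[1][c] * y2 + F[2][c] for c in range(3)]  # F.T @ x2
    num = (x2 * a[0] + y2 * a[1] + a[2]) ** 2
    return num / (a[0] ** 2 + a[1] ** 2 + b[0] ** 2 + b[1] ** 2)


def mean_root_sampson(F, gt_pairs):
    """Mean over ground truth correspondences of the root Sampson distance."""
    return sum(math.sqrt(sampson_sq(F, p1, p2)) for p1, p2 in gt_pairs) / len(gt_pairs)


def is_success(F, gt_pairs, threshold=10.0):
    """An estimate counts as a success when the measure is below the threshold."""
    return mean_root_sampson(F, gt_pairs) < threshold
```

The default of 10 pixels is the threshold used for the
summary table; the figure sweeps this threshold instead.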

In Figure 11 we present the number of correct epipolar
geometry estimations on each set of image pairs as
a function of the threshold. Although our method is
deterministic, algorithms for epipolar geometry estimation
are not. Therefore, for the sake of proper comparison,
we show an average over 5 executions of the algorithms.
The error bars represent one standard deviation.
On the “ZuBuD2 set” and the “BLOGS+USAC
dataset” all of the checked algorithms perform
extremely well. The performance after our preprocessing
is similar. The results for the other datasets are
dramatically lower for all the checked algorithms. This
indicates that many of the image pairs are challenging.
The significant improvement due to our preprocessing
method can be seen in all the checked algorithms.
For example, for the “Urban dataset” and the “Open2
dataset” our preprocessing step improved the performance
by a factor of two or three for USAC.

In Table 2 we present numeric comparisons of the
general performance with and without our preprocessing
step. For each algorithm we report its results with
and without our step, followed by our contribution
for this algorithm. Our contribution is the percentage
change, computed as follows:

$$\text{contribution} = \frac{\text{result with our step} - \text{result without our step}}{\text{result without our step}}$$

where results with and without our step are defined
as the number of successful image pairs, with performance
measure smaller than 10 pixels. Among all the
checked cases there is a negligible degradation due to
our method, of one or two out of 137 image pairs, in the
ZuBuD2 dataset for BLOGS and USAC. In all other
verified cases, our preprocessing step improves the
performance of all the checked algorithms. In general we can
summarize that our preprocessing algorithm yields better
results in hard cases and does not degrade on the
easy ones.
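
As a worked check of the contribution formula, the short
sketch below recomputes a few entries of Table 2 from the
reported success counts (the function name is hypothetical):

```python
def contribution(with_step, without_step):
    """Percentage change in the number of successful image pairs."""
    return 100.0 * (with_step - without_step) / without_step


# Success counts taken from Table 2.
usac_open2 = contribution(91.6, 27)      # USAC on Open2
blogs_zubud1 = contribution(104.6, 69.4)  # BLOGS on ZuBuD1
blogs_zubud2 = contribution(133.8, 135)   # BLOGS on ZuBuD2
print(round(usac_open2), round(blogs_zubud1, 1), round(blogs_zubud2, 1))
```

The rounded values agree with the 239%, 50.7% and -0.9%
entries of the table.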

Another issue worth mentioning is run time. Our
step is a preprocessing step for any epipolar geometry
algorithm. As such, the run time cannot be shorter
when our method is used. Moreover, as described in
Section 4.7, it requires running the algorithm from the
guided RANSAC family twice, once for each set of
putative matches. Therefore, the run time with our method
is expected to be at least doubled. In Figure 12 we show
its time overhead. We defined it as:

$$\text{time overhead} = \frac{\text{run time with our step}}{\text{run time without our step}},$$

and chose to present it as a function of our
contribution, discussed previously. Each point shows one
algorithm on one of the datasets. There are three
algorithms and six datasets, resulting in 18 points. It
appears that there is a negative correlation between
the time overhead of our method and its contribution,
which can be explained as follows. Our method takes
almost constant time regardless of the difficulty of the
image pair. Epipolar geometry estimation algorithms,
on the other hand, run much faster on easy image pairs.
Therefore, the larger our contribution, the easier it was
for USAC/BEEM/BLOGS to finish running. Consider
the extreme example of running USAC on the “Open2
dataset”. The standard algorithm succeeded on only
27 image pairs, while with our preprocessing step about 91
successes were registered (contribution of 239%). This
increase in performance was achieved at a factor of 2.7
in running times. As a result, due to the time overhead,
we would recommend applying our method only on
challenging image pairs or when the standard method
failed.

Fig. 12 Time overhead of our method. There is a negative
correlation between time overhead of our method and its
contribution.


5.4 Analysis 

Our method is a combination of several steps. One
could naturally ask whether all of these steps are
necessary and what their contributions are. To answer
those questions we tested several variations of the
algorithm, which skip parts of our algorithm or replace
them with standard ones. In Figure 13 we chose to
present this analysis on the “Open1 dataset” while
running USAC. Experiments on different datasets with other
algorithms yielded qualitatively similar results.

The Blue curve is the result of the USAC algorithm
without our contribution. The Green curve is the result
when the 2keypoint match ranking is used as the input
for epipolar geometry estimation. It yields better
results than the original USAC. This indicates
that 2keypoint generation is a strong component
of our method.

Fig. 13 Analysis of different parts of our method. Major
contribution can be attributed both to the 2keypoint matches
generation and ranking and to the global ranking of matches.

The Cyan curve is a result of altering our method.
We generate $K_{2kp}(K_{2kp} - 1)/2$ fundamental matrices,
but instead of exploiting all of them, as is done in
our algorithm, we take the fundamental matrix with the
largest support, as is usually done. This gives poor
results, mostly because each fundamental matrix is
calculated from four pairs of matches instead of 7 or 8,
giving quite an inaccurate estimation. We believe that
local optimization could yield better results, but this is
beyond the scope of this work.

The Red curve is based on the putative matches 
{X} and their {sfm}. Using the training set described 
above we converted the sfm into a probability measure 
and used it as the input for epipolar geometry estima¬ 
tion. The resulting Red curve exhibits improved per¬ 
formance with respect to the 2keypoint match ranking. 
Therefore, this step also yields an important contribu¬ 
tion to our method. 

The Black curve shows the result of our entire
method as is, without any changes. Those results are
similar to those based on {X} and their {sfm}. This
is not surprising, recalling the reasoning behind
combining all the measures. As was already mentioned,
the intent was to improve performance on challenging
image pairs, while not harming the performance
on easy ones. Therefore, on challenging datasets such
as the “Open1 dataset”, we expect our method to rely
mostly on {X} and their {sfm} and have a minor
contribution from $\{X_L\}$ and $\{X_B\}$. This is exactly
what is exhibited by the similarity between the Black and
Red curves. To verify the contribution of this step,
we analyzed the performance of the algorithms on the
easier “BLOGS+USAC dataset”. We found that relying
on the putative matches {X} alone degraded the
performance, on average, relative to our entire method
for BLOGS/BEEM/USAC by 3.9%, 3% and 7.4%
respectively. Since combining all the measures is relatively
cheap, it is still recommended, especially in the easy
cases, even though its contribution is small.

Concluding this section we can state that every step 
of our algorithm is necessary and that our good re¬ 
sults can mostly be attributed to the 2keypoint matches 
generation and ranking and to the global ranking of 
matches. 

6 Conclusions 

In this paper we presented a general deterministic
preprocessing step for epipolar geometry estimation
algorithms. It generates a set of putative feature matches
between the two images, accompanied by prior
probabilities that each match is correct. The algorithm was
tested on almost 900 image pairs from six publicly
available datasets. We showed experimentally that
state-of-the-art algorithms which use the output of our
algorithm outperform the same algorithms when they use
the standard input. In general we can summarize that our
preprocessing algorithm yields better results in hard cases
and does not degrade on the easy ones. This method is
general and we believe that it can be used as the initial
step of all guided RANSAC algorithms, improving their
performance.

Fig. 11 Performance comparison between several algorithms with and without our preprocessing step on the standard datasets.
Solid Red curves: BLOGS. Dashed Red curves: our method followed by BLOGS. Solid Green curves: BEEM. Dashed Green
curves: our method followed by BEEM. Solid Blue curves: USAC. Dashed Blue curves: our method followed by USAC.
























